Abstract. - Complex networks are used to depict the topological features of complex systems. The
structure of a network characterizes the interactions among the elements of the system and facilitates
the study of the dynamical processes taking place on it. In previous investigations, the topological
infrastructure underlying dynamical systems has been simplified as a static and invariable skeleton.
However, this assumption cannot cover the temporal features of many time-evolving networks,
whose components evolve and mutate. In this letter, utilizing the log data of WiFi users on
a Chinese university campus, we infuse the temporal dimension into the construction of a dynamical
human contact network. By quantitative comparison with the traditional aggregation approach,
we find that the temporal contact network differs in many features, e.g., the reachability and the path
length distribution. We conclude that the correlation between temporal path length and duration
is not only determined by their definitions, but is also influenced by the micro-dynamical features
of human activities under certain social circumstances. The time order of individuals' interaction
events plays a critical role in understanding many dynamical processes that unfold via the human close
proximity interactions studied in this letter. In addition, our study provides a promising measure
to identify potential superspreaders by distinguishing the nodes that function as relay hubs.
Introduction. - Complex network theory has been widely applied in many fields of science and
engineering [1, 2]. Many complex systems are characterized by models of complex networks, where
nodes represent the elements of the systems and edges describe the interactions among them.
In recent years, the study of dynamical processes on complex networks has received extensive
attention [3], e.g., epidemic spread [4-7], population dynamics [8, 9], and Internet packet
routing [10, 11]. In previous investigations, the topological infrastructure underlying dynamical
systems is generally simplified as a static and invariable skeleton, where nodes and edges are
assumed to be permanent entities (although the edge weights might change dynamically). This
simplification stems from the fact that detailed temporal information on the structure evolution
is unavailable because of technological limitations, or that the variation of the network structure
is not frequent enough to influence the dynamical processes (e.g., the relation between the highway
network and the commuting traffic). Obviously, this assumption cannot cover the temporal features
of many time-evolving networks, whose components evolve and mutate [12-16]. Taking human contact
networks [17-20] as an example, people do not keep interacting with their neighbors perpetually,
and the time ordering is intrinsically embedded in human close proximity interactions (CPIs) (in
other words, an individual does not contact all neighbors simultaneously, and the times at which
the interactions are active determine the temporal sequence). Traditionally, a network is regarded
as the underlying infrastructure of dynamical processes. Introducing the temporal dimension may
help us infuse the temporal information about dynamical events into the network construction [12-16].

Recently, the analysis of the temporal features of complex communication networks, especially
human contact networks, has received a boost from advances in information technology, e.g.,
wireless communication. Newly created digital instruments not only reshape our daily life, but
also record the tremendous amount of digital data produced by daily human activities, which can
be utilized to analyze human behaviors [17-20]. WiFi, as a ubiquitous wireless access technology,
has been widely deployed in everyday environments: the "WiFi-Free" sign can be found in nearly
every corner of urban areas, and the notion of the "WiFi-City" is becoming reality. Commercial
WiFi systems provide a powerful tool to collect the digital traces of a huge population without
artificial interference. In this letter, with the digital traces automatically recorded by the
WiFi control system of a Chinese university campus, we construct a contact network of human CPIs
from the temporal perspective, which mainly embraces the concept of vector clocks widely used in
distributed computing [21]. By quantitative comparison with the traditional aggregated networks,
we find that the temporal contact network differs in many features, e.g., the reachability and
the path length distributions. We conclude that the correlation between temporal path length and
duration is not only determined by their definitions, but is also influenced by the micro-dynamical
features of human activities under certain social circumstances (we use two more datasets to
check this result). The time order of interaction events plays an important role in understanding
many dynamical processes via human CPIs, e.g., epidemic spread and information diffusion. In
addition, our study provides a promising measure to identify potential superspreaders by
distinguishing the nodes that function as relay hubs.

Background and data description. The WiFi system in this study is deployed at the Handan
campus of Fudan University in Shanghai, China, where all administrative, athletic, academic and
teaching buildings are covered by wireless access points (WAPs). This system automatically
records users' access-related events, each of which is marked by the Media Access Control (MAC)
address of the user's mobile device, the online and offline times, and the MAC address of the
WAP. We mainly select the WiFi data recorded in the 6 teaching buildings for two reasons: (i) the
6 teaching buildings are open to all campus members because almost all curricula are scheduled
there, while other buildings are limited or partially limited to certain campus members; (ii) in
the 6 public teaching buildings, WiFi users seldom leave their mobile devices unattended for a
long period, whereas elsewhere it is common for a machine to remain connected to the WAP after
its user has left the "secured" office. Our study uses the data from 14th October to 25th
November, covering about 14000 WiFi users.

In order to extract the information on CPIs from the data, we first put forward an assumption:
WiFi devices seeing the same WAP imply a close indoor proximity interaction among the devices'
owners [22]. The reason behind this assumption is that in a confined indoor environment people
have a high probability of communicating directly with each other or of being related via
indirect interactions. For instance, a respiratory disease such as SARS (the survival period of
the SARS virus is about 24 hours, which is long enough to infect people in a confined indoor
environment) might propagate from person to person; a computer virus might bypass the
communication protocols and spread from one device to others through the WAP or in other
peer-to-peer manners such as Bluetooth [23].

We define $e_i = (V_i, t_{i1}, t_{i2})$ as an interaction event (IE), where $V_i$ denotes the
set of users connected to the same WAP simultaneously, and any user $v \in V_i$ interacts with
the others in $V_i$ during the time period from $t_{i1}$ to $t_{i2}$. With this treatment, we
translate the WiFi access-related logs into human interaction events.
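
The translation from access logs to IEs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each log record is taken to be a tuple (user, wap, t_on, t_off) with numeric timestamps, a user is assumed to have no overlapping sessions on the same WAP, and the name sessions_to_events is hypothetical. The sketch cuts the timeline of each WAP at every arrival or departure, so that the set of connected users is constant within each emitted IE.

```python
from collections import defaultdict

def sessions_to_events(sessions):
    """Turn (user, wap, t_on, t_off) records into interaction events.

    Returns a time-ordered list of (users, t_start, t_end) tuples, one per
    maximal interval in which the same set of at least two users shares a WAP.
    """
    per_wap = defaultdict(list)
    for user, wap, t_on, t_off in sessions:
        per_wap[wap].append((t_on, 1, user))   # 1 marks an arrival
        per_wap[wap].append((t_off, 0, user))  # 0 marks a departure
    events = []
    for wap in sorted(per_wap):
        present, prev_t = set(), None
        # Sorting puts departures before arrivals at equal timestamps,
        # so back-to-back sessions do not count as a contact.
        for t, is_arrival, user in sorted(per_wap[wap]):
            if prev_t is not None and t > prev_t and len(present) >= 2:
                events.append((frozenset(present), prev_t, t))
            if is_arrival:
                present.add(user)
            else:
                present.discard(user)
            prev_t = t
    events.sort(key=lambda e: (e[1], e[2]))
    return events

# Hypothetical toy log: three users on one WAP.
toy_log = [("a", "w1", 0, 10), ("b", "w1", 5, 15), ("c", "w1", 8, 12)]
toy_events = sessions_to_events(toy_log)
```

On the toy log, users a and b share the WAP from 5 to 8, all three from 8 to 10, and b and c from 10 to 12, which gives three consecutive IEs.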

We first partition the whole dataset into several subsets, each of which contains the human
activities of one week in the observed month. Counting the number of WiFi users in each hour of
any subset, we find that there are circadian patterns underlying the contact activities. As
shown in Fig. 1, the average number of WiFi users from Monday to Friday is evidently larger than
that at the weekend. This indicates that users' WiFi access behaviors are in accordance with the
weekly curriculum schedule of the university. Strikingly, we also observe that the number of
WiFi users fluctuates strongly over the course of each workday, while this fluctuation is largely
weakened at weekends. During lunch and dinner time on a workday, most people go to the dining
hall, so few remain online. Meanwhile, students and teachers need to change classrooms during
the breaks according to the workday curriculum schedule. On workday nights, students commonly
stay in classrooms with their laptops to do their homework or for entertainment. Since most of
their mobile devices are connected to the WAPs, the number of users peaks at night (Tuesday and
Friday are exceptions, as no classes are scheduled on Tuesday afternoon and many local students
go home on Friday night). Therefore, the fluctuation of the number of WiFi users indicates that
the natural rhythm of individuals' daily activities is in accordance with the daily schedule of
the university. The teaching buildings are open from 7AM to 11PM, and people have to leave these
buildings before 11PM. Therefore, we only take into account the IEs occurring in the 16 opening
hours of every day hereafter.
Fig. 1: Number of WiFi users in each hour of the week (19th October to 25th October).



From the traditional aggregated network perspective, each WiFi user is denoted as a node, and an
edge links any two nodes if at least one IE involving both of them takes place within the
observation epoch. Each edge is weighted by the number of contacts (or the total contact
duration). The strength of a node is the sum of the weights of the edges attached to the node.
Aggregating all the IEs in each week, we build the aggregated contact network (ACN) at the weekly
level. Fig. 2 presents the node degree distribution of the weekly aggregated networks, averaged
over all weeks. The distribution is exponential, indicating that most users have only a limited
number of contacts, which has also been reported in [19]. We alter the time window of the
aggregation process (e.g., one day, one month) and find that the exponential trend is robust.
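
The aggregation step can be illustrated with a short Python sketch. It assumes that IEs are given as (users, t_start, t_end) tuples and weights each edge by the number of IEs involving both endpoints; the function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations

def aggregate(events):
    """Build the aggregated contact network as {(u, v): number of contacts}."""
    weight = defaultdict(int)
    for users, _t_start, _t_end in events:
        # Every pair of users present in the same IE shares one contact.
        for u, v in combinations(sorted(users), 2):
            weight[(u, v)] += 1
    return dict(weight)

def degree_and_strength(weight):
    """Return (degree, strength) dictionaries for all nodes of the network."""
    degree, strength = defaultdict(int), defaultdict(int)
    for (u, v), w in weight.items():
        for node in (u, v):
            degree[node] += 1     # number of distinct neighbors
            strength[node] += w   # sum of the attached edge weights
    return dict(degree), dict(strength)

# Hypothetical toy IEs: (users, t_start, t_end).
toy_events = [({"a", "b"}, 5, 8), ({"a", "b", "c"}, 8, 10), ({"b", "c"}, 10, 12)]
acn = aggregate(toy_events)
deg, strg = degree_and_strength(acn)
```

Replacing the increment by `t_end - t_start` would give the alternative weighting by total contact duration.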
Fig. 2: The degree distribution of the ACN with a time window at the weekly level. The
distribution is averaged over all weeks.



Constructing the temporal contact network. 

From the temporal perspective, we first define a temporal contact (TC): user i makes a TC with j
if there is a sequence of IEs with non-decreasing time linking i and j [24]. Because time creates
its own dimension in the event sequences, the TCs are directed. For instance, in Fig. 3, user a
interacts with b before b interacts with c. There is only a TC from a to c, and c cannot make a
TC with a, because nothing (e.g., information) can be propagated from c to a via b. In a temporal
contact network (TCN), an edge links any two nodes if they have at least one TC.

Before a given observation time t, there might exist many TCs linking users i and j. We define
$\phi_{i,j}(t)$ as the inception time of the latest TC from i to j before t. Considering that a
new TC from i to j is created at time t', we define the temporal path duration as
$\tau_{i,j}(t') = t' - \phi_{i,j}(t')$, which measures the time interval consumed by the
corresponding temporal path. Before the given observation time t, the temporal path length
$\theta_{i,j}(t)$ is defined as the shortest length of all the latest TCs (the least number of
IEs among all the latest TCs). Therefore, the edges in the TCN can be weighted by the temporal
path length or duration (the detailed algorithms for calculating $\phi_{i,j}(t)$ and
$\theta_{i,j}(t)$ are given in the Appendix).

Analysis. 

Reachability and Path Lengths. In the ACN, the size $N_i$ of the component to which a given node
i belongs indicates an upper bound on the number of nodes that can be influenced by spreading
dynamics (e.g., virus transmission) launched from i. $N_i$ characterizes the reachability of
node i in the aggregated network. Employing the causal timing of the IEs to construct the
temporal version of the contact network, we first measure the network reachability. For any
given node i, we denote the number of nodes that can be temporally contacted by i within a given
observation period $\Delta t = t_2 - t_1$ as the reachability of i [25], which is equal to the
number of elements in the set $\{\phi_{i,j}(t_2) \mid \phi_{i,j}(t_2) > t_1,\ j \in V\}$. The
average (maximum) network reachability is given by the mean (maximum) value of all nodes'
reachability.
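
As a hedged illustration of this definition, the following Python sketch counts, for each node i, the nodes j whose latest TC from i started after $t_1$. The nested dictionary phi[i][j] (holding $\phi_{i,j}(t_2)$), the function names, and the choice to exclude i itself from the count are assumptions of this example. The network-level values are normalized by the network size, as in the comparison with the ACN.

```python
def node_reachability(phi, i, t1):
    """Number of nodes j != i whose latest TC from i started after t1."""
    return sum(1 for j, start in phi.get(i, {}).items() if j != i and start > t1)

def network_reachability(phi, nodes, t1):
    """Average and maximum reachability, both normalized by the network size."""
    values = [node_reachability(phi, i, t1) for i in nodes]
    n = len(values)
    return sum(values) / n / n, max(values) / n

# Hypothetical toy clocks: phi[i][j] is the inception time of the latest TC i -> j.
toy_phi = {"a": {"b": 5, "c": 8}, "b": {"a": 1, "c": 10}, "c": {}}
avg_reach, max_reach = network_reachability(toy_phi, ["a", "b", "c"], t1=2)
```

In the toy data, the TC from b to a started at time 1, before the window opened at $t_1 = 2$, so it does not contribute to the reachability of b.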







Fig. 3: (color online) Schematic illustration of the TCN and ACN construction. (a) WiFi access
logs are translated into the corresponding IEs. The black arrowed lines show the temporal
contacts between individuals in different IEs. (b) The construction of the TCN. The edges are
colored according to the latest time at which they are updated. (c) The construction of the ACN
over the observed duration to ~ ts.



Fig. 4 compares the network reachability (normalized by the network size) of the TCN and the ACN
over the same observation period. Both the average and the maximum values show that the ACN's
reachability is larger, especially as $\Delta t \to 0$. The reachability of the ACN quickly
saturates as $\Delta t$ increases, while the reachability of the TCN saturates much more slowly.
The saturated value of the average reachability of the TCN is much smaller than that of the ACN.
Hence the temporal dimension provides a tighter upper bound for the network reachability.
Fig. 4: (a) Comparison of the average reachability between the ACN and TCN. (b) Comparison of
the maximal reachability between the ACN and TCN.



We also compare the distribution of the shortest path lengths of the ACN with that of the TCN.
In the ACN, the shortest path length $d^a$ denotes the topological distance between any given
source and destination, while in the TCN the temporal version of the shortest path length $d^t$
denotes the least number of IEs of a given fastest TC. We find that the distribution of $d^t$ is
broader than that of $d^a$, and the mean value of $d^t$ is larger. Besides, the maximum of $d^t$
is about twice that of $d^a$.

The differences in reachability and path length reveal the structural discrepancy between the
ACN and the TCN. In the following, we mainly study the features of the TCN.
Fig. 5: (color online) The correlation of the temporal path duration $\tau$ and length $\theta$
for (a) the WiFi system under study, (b) the SG exhibition, (c) the HT09 conference. In panel
(a), the conditional probability $p(\theta|\tau)$ is averaged at the weekly level.



The correlation between path length and duration. In the ACN, the time spent on each edge can be
regarded as identical, so it is evident that the longer the path is, the more time it takes to
build the path. In the TCN, the time consumed by the fastest TCs from person to person is
characterized by the temporal path duration $\tau$, and the least number of IEs involved in the
TCs is denoted by the temporal path length $\theta$. Whether the positive correlation between
path length and duration in the ACN can also be found in the TCN remains unclear. Fig. 5(a)
presents the conditional probability distribution $p(\theta|\tau)$ of the WiFi deployment, which
does not show a positive correlation, but reveals that considerable fluctuations of $\theta$
exist for any given $\tau$. For instance, when we fix the temporal path duration of the TCs to 1
day, their temporal path lengths vary dramatically, on the order of $10^4$. The Spearman
correlation coefficient $\rho$ between $\theta$ and $\tau$ is 0.14, indicating that the strong
positive correlation between path length and duration is absent in the TCN.
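
For readers who wish to reproduce this kind of measurement, the Spearman coefficient is the Pearson correlation of the ranks of the two samples. The sketch below is a plain-Python illustration, not the code used in this study; the function names are hypothetical, ties receive their average rank, and the toy inputs are invented for the example.

```python
from math import sqrt

def average_ranks(values):
    """Rank the values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank series."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy durations and lengths, for illustration only.
rho_toy = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The function is undefined when one of the samples is constant, since the rank variance then vanishes.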

In particular, the definitions of temporal path length and duration require us to select the
fastest TCs with different inception times before measuring the length or duration. If there are
several TCs pointing to a given destination node, the duration of one path may differ from that
of the others even though their path lengths are equal. It is also possible that in a given time
period there exists more than one path between a given source and destination (with equal
temporal path durations), because distinct sequences of IEs link them (individuals may be
present in many IEs in the given time period) [20].

Furthermore, the correlation between temporal path length and duration is not only determined by
their definitions, but is also influenced by the micro-dynamical features of human activities
under certain social circumstances. To examine the impact of social circumstances on the
correlation, we utilize longitudinal data on human face-to-face proximity contacts at a
conference (HT09) and a museum (SG exhibition) [19], collected by active Radio Frequency
Identification Devices (RFID). The datasets are provided by the SocioPatterns Project
(http://www.sociopatterns.org/; for more details see the Appendix). The attendees of the HT09
conference form a closed population like the university members in our study, while the visitors
to the SG exhibition seldom repeat their visit day after day. The absence of a strong
correlation between the temporal path length and duration at the HT09 conference (Fig. 5(c),
$\rho = 0.45$) is similar to that of Fig. 5(a), whereas the temporal path length clearly
increases with the temporal path duration at the SG exhibition (Fig. 5(b), $\rho = 0.80$). The
distinct correlations result from the different social circumstances of HT09 and SG: at the SG
exhibition, visitors typically spend a limited time at each site and visit different locations
following a rather pre-defined route; at the HT09 conference, most attendees stay on-site during
the entire program (a few days) and move at will among limited areas such as the conference hall
and the corridor for coffee breaks. Our university members and the HT09 conference attendees form
closed populations, in which individuals encounter each other more frequently than in an 'open'
system like the SG museum exhibition. Moreover, in a closed population, individuals seldom quit
the system, so they can contact each other repeatedly, whereas the visitors to the SG exhibition
form an 'open' circumstance with a flux of individuals streaming through the site, who seldom
return once they leave. Therefore, the temporal path length grows with the path duration at the
SG exhibition.

The correlation of the temporal out-degrees and in-degrees. In the TCN, the out-degree
$d_{out,i}$ of any node i quantifies the number of receptors temporally affected by i, while the
in-degree $d_{in,i}$ specifies the number of its potential inciters. The correlation of
$d_{out}$ and $d_{in}$ can be employed to distinguish the roles of nodes in a dynamical system
[20, 26]. Fig. 6(a)-(f) present the joint probability distribution
$C_{\Delta t}(d_{out,i}, d_{in,i})$ for different observation periods $\Delta t$. For each
panel, we uniformly partition the whole dataset into subsets of duration $\Delta t$, measure the
joint probability distribution of each subset, and average them to obtain
$C_{\Delta t}(d_{out,i}, d_{in,i})$.
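
A minimal Python sketch of this measurement for a single subset is given below. It assumes that the temporal reachability of each node is already available as a dictionary reach mapping a source to the set of nodes it can temporally contact; the names reach, temporal_degrees and joint_distribution are hypothetical. Averaging the resulting distributions over all subsets of the same duration would then give the quantity plotted in each panel.

```python
from collections import Counter

def temporal_degrees(reach):
    """Out- and in-degrees of every node from {source: set of reached nodes}."""
    nodes = set(reach)
    for targets in reach.values():
        nodes |= set(targets)
    d_out = {n: len(set(reach.get(n, ())) - {n}) for n in nodes}
    d_in = {n: 0 for n in nodes}
    for src, targets in reach.items():
        for dst in set(targets) - {src}:
            d_in[dst] += 1  # src is a potential inciter of dst
    return d_out, d_in

def joint_distribution(d_out, d_in):
    """Empirical joint probability of the pairs (d_out, d_in) over all nodes."""
    counts = Counter((d_out[n], d_in[n]) for n in d_out)
    total = len(d_out)
    return {pair: c / total for pair, c in counts.items()}

# Hypothetical toy reachability: a reaches b and c, b reaches c.
toy_reach = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
toy_out, toy_in = temporal_degrees(toy_reach)
toy_joint = joint_distribution(toy_out, toy_in)
```

In the toy example, node a lies on the out-degree axis, node c on the in-degree axis, and only node b has both a nonzero in-degree and a nonzero out-degree.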
Fig. 6: (color online) The joint probability distribution of $d_{out}$ and $d_{in}$. When
$\Delta t$ is larger than 7 days, a set of data points is always present in the upper right
corner. $\Delta t$ is (a) 1 day, (b) 2 days, (c) 3 days, (d) 5 days, (e) 7 days, (f) 8 days.



With small $\Delta t$ ($\Delta t = 1$ day in Fig. 6(a)), almost all data points reside in the
lower triangular region, and many data points even stick to the axes. This can be explained by
the fact that in a short observation period, e.g., 1 day, only a limited number of users stay
online all the time, while most individuals act according to their curriculum schedules, which
seldom cover the whole day. If some users access the WAPs in the morning, they can temporally
reach others who log on later that day according to the above assumption and definitions, and
they are free from the temporal contacts of subsequent users; if some people use WiFi at night,
they can be temporally influenced by the daytime users (recall the fluctuations we observed in
Fig. 1). Occasionally, some individuals use WiFi both in the morning and at night, but they do
not necessarily become nodes with high in-degree and out-degree, because the degree also depends
on the number of nodes they directly interact with in the observation period. Therefore, very
few individuals have a high potential to become nodes with large in- and out-degrees in a short
observation period. As $\Delta t$ increases, the number of times each user appears in the
dataset also grows. The forward and backward temporal influence of each user increases as time
proceeds, and the limitation imposed by the finite observation period is gradually weakened.
More and more data points emerge in the upper triangular region of Fig. 6(b)-(d) ($\Delta t$: 2
days $\to$ 6 days). When $\Delta t \geq 8$ days, the distribution of data points consistently
exhibits two evidently nontrivial clusters: one contains the data scattered along the diagonal,
and the other contains the data anchored in the upper right corner. They are mainly composed of
the frequently accessing users. There are also two obvious clusters of data points sticking to
the axes, which contain the users with less frequent connections. Though possessing a large
in-degree or out-degree, they are actually trivial hubs because they lack the efficiency to
transfer information or viruses. In essence, the individuals present in the two nontrivial
clusters, particularly those in the upper right corner of Fig. 6, can play the role of relay hubs
that are critical to spreading processes. Actually, when $\Delta t$ is small, there still exists
a small number of relay hubs (for example, see those in the right corner of Fig. 6(b)), and the
similar pattern of the joint probability distributions for $\Delta t$ beyond 7 days might result
from the weekly cycle of the curriculum schedules.

To examine whether each individual's role is the same in the ACN and the TCN, we rank the
individuals by degree in the two versions of the contact network, respectively, and calculate
Kendall's tau coefficient $\tau_k$ of the two rank series. Taking $\Delta t = 7$ days as an
example, Kendall's tau coefficient between the ranking of in-degree (out-degree) in the TCN and
the ranking of degree in the ACN is just 0.33 (0.30) on average. This indicates that an
individual's role may differ between the ACN and the TCN. Nodes with high degrees in the ACN may
have only high in-degrees or only high out-degrees in the TCN, and many 'leaf' individuals in
the ACN are actually relay hubs in the TCN, whose role is often underestimated in ACN analysis
[18, 20]. The nodes that play the role of relay hubs in the TCN bear a high force of infection,
and are therefore more likely to act as superspreaders. The identification of influential
spreaders is essential in designing optimal containment strategies [27], and here we provide
a direct method to identify potential superspreaders.
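
The rank comparison can be illustrated with a plain-Python sketch of Kendall's tau. The text does not state which variant of the coefficient is used; the sketch below assumes the tau-b variant, which corrects for ties (degrees are integers, so ties are common), and the function name and toy inputs are hypothetical.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b of two equally long sequences (O(n^2) pair counting)."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1   # the pair is ordered the same way in x and y
            elif dx * dy < 0:
                discordant += 1   # the pair is ordered oppositely
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / sqrt((n_pairs - tied_x) * (n_pairs - tied_y))

# Hypothetical toy degrees: ACN degree versus TCN out-degree of four nodes.
tau_toy = kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])
```

In the toy example, five of the six node pairs are ordered consistently and one is swapped, so the coefficient is (5 - 1)/6. The coefficient is undefined when all values of one sequence are tied.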

Conclusions. - In this letter, we take advantage of the human digital traces automatically
collected by the WiFi control system of a Chinese university campus. We map users' CPIs into
TCNs and analyze the infrastructure of the constructed dynamical system, which characterizes the
CPIs among WiFi users, from the perspective of temporal networks. This treatment infuses the
temporal ordering of the WiFi users' interaction events into the network construction. By
quantitative comparison with the aggregated contact networks, we find that the temporal contact
network differs in many features, e.g., the reachability and the path length distribution. We
also find that the correlation between temporal path length and duration is not only determined
by their definitions, but is also influenced by the micro-dynamical features of human activities
under social circumstances. In addition, we provide a direct method to identify potential
superspreaders, which deserves more detailed study in future work and may help us design more
efficient containment plans.

* * * 

We thank the SocioPatterns Project for sharing their high resolution data on human face-to-face
proximity contact. This study is supported by the National Key Basic Research and Development
Program (No. 2010CB731403) and the NCET program (No. NCET-09-0317) of China.

Appendix. 

Calculation of $\phi(t)$ and $\theta(t)$. To calculate $\phi(t)$ and $\theta(t)$, we modify the
vector-clock-based event-driven algorithm [21] by passing through all IEs ordered by time.
Taking an IE $e_i = (V_i, t_{i1}, t_{i2})$ as an example, we specify the main steps of the
algorithm as follows:
Step 1: Initialize the information about the nodes participating in the current IE:

$\phi_{u,u}(t) = t_{i2}, \quad \theta_{u,u}(t) = 0, \quad \forall u \in V_i.$

Step 2: Compare the vector clocks of the nodes in the IE to find the latest temporal information:

$T(v) = \max\{\phi_{u,v}(t) \mid \forall u \in V_i\}.$

Step 3: Find the shortest path length of the updating TCs:

$L(v) = \min\{\theta_{u,v}(t) \mid \forall u \in V_i \text{ and } \phi_{u,v}(t) = T(v)\}.$

Step 4: Update the temporal path information as

$\forall u \in V_i,\ \theta_{u,v}(t) = L(v) + 1, \text{ if } \phi_{u,v}(t) \neq T(v) \text{ or } \theta_{u,v}(t) \neq L(v);$
$\forall u \in V_i,\ \phi_{u,v}(t) = T(v).$

The update of $\phi(t)$ indicates the creation of a new TC, and the update of the corresponding
element in $\theta(t)$ records the shortest path length of the new TC.
Step 5: Process the next IE and return to Step 1.
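
The five steps above can be sketched in Python as follows. This is an illustrative reading rather than the implementation used in this study. It assumes that IEs are (users, t_start, t_end) tuples that have already been split so that they can be processed one after another, that the clocks are stored as nested dictionaries, and that phi[u][v] follows the index order of the steps above, i.e., it is read as the latest information about v held by u. The function name update_clocks is hypothetical.

```python
def update_clocks(events):
    """Run the vector-clock update over time-ordered IEs.

    Returns (phi, theta), where phi[u][v] is the inception time of the latest
    TC carrying information about v to u, and theta[u][v] is its length in IEs.
    """
    phi, theta = {}, {}
    for users, _t_start, t_end in sorted(events, key=lambda e: (e[1], e[2])):
        # Step 1: every participant refreshes the information about itself.
        for u in users:
            phi.setdefault(u, {})[u] = t_end
            theta.setdefault(u, {})[u] = 0
        # Nodes about which at least one participant holds information.
        known = set()
        for u in users:
            known |= set(phi[u])
        for v in known:
            holders = [u for u in users if v in phi[u]]
            # Step 2: latest inception time among the participants.
            latest = max(phi[u][v] for u in holders)
            # Step 3: shortest path length among the holders of that time.
            shortest = min(theta[u][v] for u in holders if phi[u][v] == latest)
            # Step 4: the other participants receive it through one more IE.
            for u in users:
                if phi[u].get(v) != latest or theta[u].get(v) != shortest:
                    theta[u][v] = shortest + 1
                phi[u][v] = latest
    return phi, theta

# Toy example: a meets b during [1, 2], then b meets c during [3, 4].
toy_events = [({"a", "b"}, 1, 2), ({"b", "c"}, 3, 4)]
toy_phi, toy_theta = update_clocks(toy_events)
```

In the toy example, information from a reaches c through two IEs, while a never learns anything about c, which reproduces the directed character of the TCs.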



Because each IE lasts for a certain duration, one IE often starts before other IEs have ended.
We therefore split users' WiFi logs into shorter ones, so as to preserve the time sequence of
all concurrent IEs.

The SocioPatterns project. 'SocioPatterns' is an interdisciplinary research collaboration that
uses wireless technology to gather longitudinal data on human face-to-face proximity. It uses a
data-driven approach to uncover fundamental patterns in social dynamics and human activity. The
datasets we use are the daily contact logs collected during the Infectious SocioPatterns event
at the Science Gallery (SG) in Dublin, and the human face-to-face proximity data of about 110
conference attendees during the ACM Hypertext 2009 (HT09) conference.